Unit 2 - Confidence Intervals
Ma 340 Applied Statistics
Data Import: Nursing Salary Data
Preview of the NursesWages data:
| Employee | Department | DeptScale | Status | StatusNom | StressLevel | HighestDegreeEarned | YearsofExperienceasof2016 | 2012HourlyWage | 2013HourlyWage | 2014HourlyWage | 2015HourlyWage | 2016HourlyWage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 31952 | Staff Dev | 0 | FT | 1 | 0 | Bachelor | 9 | 25.54 | 25.56 | 25.72 | 26.80 | 27.62 |
| 33540 | Staff Dev | 0 | FT | 1 | 0 | Master | 13 | 27.71 | 28.31 | 29.37 | 29.85 | 30.05 |
| 38274 | Staff Dev | 0 | FT | 1 | 0 | Diploma | 11 | 24.14 | 24.58 | 24.74 | 26.14 | 27.06 |
| 32490 | Staff Dev | 0 | FT | 1 | 0 | Associate | 10 | 25.09 | 25.11 | 25.17 | 26.37 | 27.53 |
| 34803 | Staff Dev | 0 | FT | 1 | 0 | Bachelor | 10 | 25.84 | 26.16 | 27.08 | 27.34 | 29.02 |
| 32915 | Staff Dev | 0 | FT | 1 | 0 | Associate | 27 | 29.08 | 29.14 | 29.30 | 29.70 | 30.62 |
| 37771 | Staff Dev | 0 | FT | 1 | 0 | Associate | 16 | 26.08 | 26.44 | 26.44 | 26.76 | 27.10 |
| 35608 | Staff Dev | 0 | FT | 1 | 0 | Associate | 7 | 23.92 | 24.60 | 24.86 | 25.10 | 25.34 |
| 37052 | Inf Cont | 0 | FT | 1 | 0 | Bachelor | 15 | 26.67 | 26.96 | 26.97 | 27.07 | 27.08 |
| 38169 | Inf Cont | 0 | FT | 1 | 0 | Associate | 26 | 28.56 | 28.59 | 29.28 | 29.77 | 30.63 |
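The import code itself is not shown in these notes. As a hedged sketch, the snippet below reads three columns of the ten preview rows from inline text with base R's read.csv(). It assumes the real data were loaded in a way that preserves column names such as `2016HourlyWage` (for example `check.names = FALSE`, or readr::read_csv(), which keeps such names by default). The names `preview_csv` and `NursesPreview` are hypothetical.

```r
# Hypothetical sketch: the original import code is not shown, so a few
# columns of the ten preview rows are read from inline text instead of a file.
preview_csv = "Employee,HighestDegreeEarned,2016HourlyWage
31952,Bachelor,27.62
33540,Master,30.05
38274,Diploma,27.06
32490,Associate,27.53
34803,Bachelor,29.02
32915,Associate,30.62
37771,Associate,27.10
35608,Associate,25.34
37052,Bachelor,27.08
38169,Associate,30.63"

# check.names = FALSE keeps "2016HourlyWage" as-is; with the default (TRUE)
# read.csv() would rename it "X2016HourlyWage" because it starts with a digit.
NursesPreview = read.csv(text = preview_csv, check.names = FALSE)

# Column names that start with a digit must be wrapped in backticks.
mean(NursesPreview$`2016HourlyWage`)
```

This is why the notes write `NursesWages$`2016HourlyWage`` with backticks throughout.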
Estimating Population Parameters
- Point Estimate: A single value computed from the sample (a sample statistic) that serves as the best estimate of a population parameter.
- Confidence Interval: An interval estimate of the true value of a population parameter. Abbreviated CI.
- Confidence Level: The proportion of times, \(1-\alpha\), that the confidence interval actually contains the population parameter, assuming the estimation process is repeated a large number of times.
- Margin of Error: The maximum expected difference between a sample statistic and the actual population parameter at a given confidence level.
\[\text{Confidence Interval} = \text{Point Estimate} \pm \text{Margin of Error} = (\text{Point Estimate} - \text{MoE},\ \text{Point Estimate} + \text{MoE})\]
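The formula above is simple arithmetic; a minimal R sketch of it follows. The helper name `ci_from_moe` is hypothetical, and the inputs are the rounded values from the proportion example later in these notes, used only for illustration.

```r
# Hypothetical helper: turn a point estimate and a margin of error
# into the interval (Point Estimate - MoE, Point Estimate + MoE).
ci_from_moe = function(point_estimate, moe) {
  c(lower = point_estimate - moe, upper = point_estimate + moe)
}

# Illustrative inputs: rounded p-hat and MoE from the proportion example
ci_from_moe(0.105, 0.0355)
```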
Proportions
Point Estimate
- What is the best point estimate for the population proportion?
# Sample Proportion
pe = length(which(NursesWages$HighestDegreeEarned == "Master"))/nrow(NursesWages)
The best point estimate for the proportion of all nurses with Master's degrees in 2016 is the sample proportion, \(\hat{p}\) = 0.105, from the 286 nurses randomly sampled.
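The count-then-divide expression above can also be written with mean() on a logical vector, since TRUE counts as 1 and FALSE as 0. A small sketch using the first five HighestDegreeEarned values from the preview table (not the full sample):

```r
# First five HighestDegreeEarned values from the preview table
degrees = c("Bachelor", "Master", "Diploma", "Associate", "Bachelor")

# Count-based version, as in the notes
length(which(degrees == "Master")) / length(degrees)

# Equivalent: the mean of a logical vector is the proportion of TRUEs
mean(degrees == "Master")
```

The two forms differ only if the column has missing values: which() silently treats NA as "not Master" while keeping it in the denominator, whereas mean(..., na.rm = TRUE) drops it from both.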
Margin of Error
- What is the appropriate Margin of Error for \(\hat{p}\)?
- What is the most likely spread of \(\hat{p}\)’s relative to \(p\)?
Recall - Normal Approximation of the Binomial
Consider the distribution of \(\hat{p}\). For large \(n\), it is approximately normal:
\[ \hat{p} \sim N\!\left(p,\; \frac{p(1-p)}{n}\right)\]
For \(p = 0.6\): [figure: sampling distribution of \(\hat{p}\), not reproduced]
More generally, for any \(p\), \[Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \sim N(0,1)\]
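The normal approximation for \(p = 0.6\) can be checked by simulation. This sketch assumes a sample size of \(n = 100\) (the sample size behind the original figure is not stated) and compares the spread of simulated \(\hat{p}\) values with \(\sqrt{p(1-p)/n}\).

```r
set.seed(340)   # arbitrary seed, for reproducibility
p = 0.6         # population proportion from the text
n = 100         # sample size: an assumption for illustration
reps = 10000    # number of simulated samples

# Each simulated sample: count successes with rbinom(), then divide by n
p_hat = rbinom(reps, size = n, prob = p) / n

mean(p_hat)              # should be close to p
sd(p_hat)                # should be close to the theoretical SD below
sqrt(p * (1 - p) / n)    # theoretical SD, sqrt(p(1 - p)/n)
```

Running `hist(p_hat, breaks = 30)` afterwards draws a roughly bell-shaped histogram centered near 0.6.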
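With probability \(1-\alpha\), where \(z_{\alpha/2}\) cuts off an area of \(\alpha/2\) in the upper tail, \[ -z_{\alpha/2} < Z < z_{\alpha/2} \]
so
\[ -z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} < z_{\alpha/2} \]
Multiplying through by the standard error:
\[ -z_{\alpha/2}\cdot\sqrt{\frac{p(1-p)}{n}} < \hat{p} - p < z_{\alpha/2}\cdot\sqrt{\frac{p(1-p)}{n}} \]
Subtracting \(\hat{p}\) from each part:
\[ -\hat{p}-z_{\alpha/2}\cdot\sqrt{\frac{p(1-p)}{n}} < - p < -\hat{p} + z_{\alpha/2}\cdot\sqrt{\frac{p(1-p)}{n}} \]
Multiplying by \(-1\), which reverses the inequalities:
\[ \hat{p}-z_{\alpha/2}\cdot\sqrt{\frac{p(1-p)}{n}} < p < \hat{p} + z_{\alpha/2}\cdot\sqrt{\frac{p(1-p)}{n}} \]
Since \(p\) is unknown, we replace it with \(\hat{p}\) in the standard error. Thus, \[ MoE = z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]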
Confidence Interval
\[ \hat{p}-z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
\[ \hat{p}-MoE < p < \hat{p} + MoE \]
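The confidence level describes what happens over many repeated samples. A hedged simulation sketch: it assumes a "true" proportion of 0.105 (borrowed from the example below purely for illustration), draws many samples of 286, builds the interval above from each, and records how often the interval captures \(p\).

```r
set.seed(2016)   # arbitrary seed
p = 0.105        # assumed "true" proportion, for illustration only
n = 286
alpha = 0.05
z = qnorm(1 - alpha / 2)
reps = 10000

# Simulate many samples and build the interval from each one
p_hat = rbinom(reps, size = n, prob = p) / n
moe = z * sqrt(p_hat * (1 - p_hat) / n)
covered = (p_hat - moe < p) & (p < p_hat + moe)

# Fraction of intervals that capture p: near 1 - alpha, though the
# normal-approximation interval can fall slightly short of it
mean(covered)
```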
Example:
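Check Assumptions: the sample is random, and both \(n\hat{p}\) and \(n(1-\hat{p})\) are at least 5 (some texts use 10).
nrow(NursesWages)*pe
[1] 30
nrow(NursesWages)*(1-pe)
[1] 256
Both counts are well above the threshold, so the normal approximation is reasonable.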
Margin of Error
For our sample of 286 nurses, the 95% margin of error is approximately:
alpha = .05
MoE = -qnorm(alpha/2)*sqrt(pe*(1-pe)/nrow(NursesWages))
MoE
[1] 0.03551237
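qnorm(alpha/2) returns the lower-tail critical value, which is negative; that is why the code above negates it. The sketch below checks that this equals the more common upper-tail form qnorm(1 - alpha/2) and recomputes the margin of error from the counts reported above (30 of 286 nurses).

```r
alpha = 0.05
z_lower = -qnorm(alpha / 2)      # form used in the notes
z_upper = qnorm(1 - alpha / 2)   # equivalent upper-tail form
c(z_lower, z_upper)              # both about 1.96

# Margin of error from the reported counts: 30 Master's degrees out of 286
pe = 30 / 286
moe = z_upper * sqrt(pe * (1 - pe) / 286)
moe
```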
Confidence Interval
UB = pe + MoE
LB = pe - MoE
With 95% confidence, the true proportion of nurses with Master's degrees in 2016 is between 6.94% and 14.04%.
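The steps above can be collected into one function. This is a minimal sketch of the normal-approximation (Wald) interval derived in this section; the name `wald_prop_ci` is hypothetical, not a base R function.

```r
# Hypothetical helper: Wald (normal-approximation) CI for a proportion,
# following the formulas in this section.
wald_prop_ci = function(x, n, conf.level = 0.95) {
  p_hat = x / n
  z = qnorm(1 - (1 - conf.level) / 2)
  moe = z * sqrt(p_hat * (1 - p_hat) / n)
  c(estimate = p_hat, lower = p_hat - moe, upper = p_hat + moe)
}

# 30 of 286 nurses, reported as percentages
round(100 * wald_prop_ci(30, 286), 2)
```

Calling `wald_prop_ci(30, 286, conf.level = 0.90)` gives a narrower 90% interval, since the critical value shrinks.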
Base R Tools
x = length(which(NursesWages$HighestDegreeEarned == "Master"))
n = nrow(NursesWages)
output = binom.test(x, n, conf.level = 0.95)
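output$conf.int
[1] 0.07189903 0.14635036
attr(,"conf.level")
[1] 0.95
binom.test() reports the exact (Clopper-Pearson) interval rather than the normal-approximation interval derived above, so its bounds differ slightly from 6.94% and 14.04%.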
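Base R offers more than one interval for a proportion. A short comparison sketch using the counts from this example (x = 30, n = 286): binom.test() gives the exact Clopper-Pearson interval, while prop.test() gives a Wilson score interval, with a continuity correction unless `correct = FALSE`.

```r
x = 30    # nurses with a Master's degree (n * p-hat above)
n = 286

# Exact Clopper-Pearson interval (same call as above)
binom.test(x, n, conf.level = 0.95)$conf.int

# Wilson score interval, with and without the continuity correction
prop.test(x, n, conf.level = 0.95)$conf.int
prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int
```

All three intervals are close here because \(n\hat{p} = 30\) is comfortably large; they diverge more for small samples or proportions near 0 or 1.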
Means
Point Estimate
What is the best point estimate for the population mean?
# Sample Mean
pe = mean(NursesWages$`2016HourlyWage`)
The best point estimate of the mean hourly wage of all nurses in 2016 is the sample mean, \(\bar{x}\) = $27.32, from the 286 nurses randomly sampled.
Margin of Error
- What is the appropriate Margin of Error for \(\bar{x}\)?
- What is the most likely spread of \(\bar{x}\)’s relative to \(\mu\)?
\[Z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}} \sim N(0,1)\] \[\text{Margin of Error} = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\]
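What is \(\sigma\)? The population standard deviation is usually unknown, so we replace it with the sample standard deviation \(s\). The resulting statistic follows a Student t distribution rather than the standard normal.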
Student t distribution
\[t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}; \ \ \ df = n-1\] So, \[\text{Margin of Error} = t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}\]
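The t critical value \(t_{\alpha/2,\,n-1}\) is larger than \(z_{\alpha/2}\) for small samples and approaches it as the degrees of freedom grow. A quick sketch comparing them at a few illustrative df values, including 285 for this sample:

```r
alpha = 0.05
df = c(5, 10, 30, 100, 285)   # 285 = 286 - 1, the df used for the nurse sample
t_crit = qt(1 - alpha / 2, df = df)
z_crit = qnorm(1 - alpha / 2)
data.frame(df = df, t = round(t_crit, 4), z = round(z_crit, 4))
```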
Confidence Interval
\[ \bar{x}-MoE < \mu < \bar{x} + MoE \]
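Why not just plug \(s\) into the z formula? A hedged simulation sketch: it assumes a normal population with hypothetical values \(\mu = 27\) and \(\sigma = 2\) and a small sample size \(n = 10\), then counts how often each interval captures \(\mu\).

```r
set.seed(340)         # arbitrary seed
mu = 27; sigma = 2    # hypothetical population values, for illustration
n = 10                # small sample, where t and z differ noticeably
reps = 20000
alpha = 0.05

covered_t = logical(reps)
covered_z = logical(reps)
for (i in seq_len(reps)) {
  x = rnorm(n, mean = mu, sd = sigma)
  se = sd(x) / sqrt(n)                  # estimated standard error using s
  moe_t = qt(1 - alpha / 2, df = n - 1) * se
  moe_z = qnorm(1 - alpha / 2) * se     # z with s: ignores the extra uncertainty
  covered_t[i] = abs(mean(x) - mu) < moe_t
  covered_z[i] = abs(mean(x) - mu) < moe_z
}
mean(covered_t)   # close to 0.95
mean(covered_z)   # noticeably below 0.95
```

With \(n = 286\), as in the nurse data, the two critical values are nearly equal and the difference in coverage becomes negligible.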
Example:
Check Assumptions: the sample is random, and either \(n > 30\) or the population is approximately normal.
nrow(NursesWages) > 30
[1] TRUE
Margin of Error
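For our sample of 286 nurses, \(\sigma\) is unknown, so we use the sample standard deviation and the t distribution with \(n - 1 = 285\) degrees of freedom.
alpha = .05
MoE = -qt(alpha/2, nrow(NursesWages)-1) * sd(NursesWages$`2016HourlyWage`) / sqrt(nrow(NursesWages))
MoE
[1] 0.2119204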
Confidence Interval
UB = pe + MoE
LB = pe - MoE
With 95% confidence, the true mean hourly wage of nurses in 2016 is between $27.10 and $27.53.
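A sketch wrapping the t-interval steps in one function and checking it against t.test(). The helper name `t_mean_ci` is hypothetical, and the data are just the ten 2016 wages from the preview table; they are not a random sample and are used only to check the arithmetic.

```r
# Hypothetical helper: t-based CI for a mean, following the formulas above
t_mean_ci = function(x, conf.level = 0.95) {
  n = length(x)
  alpha = 1 - conf.level
  moe = qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  c(estimate = mean(x), lower = mean(x) - moe, upper = mean(x) + moe)
}

# 2016 hourly wages of the ten nurses in the preview table
wages = c(27.62, 30.05, 27.06, 27.53, 29.02, 30.62, 27.10, 25.34, 27.08, 30.63)
t_mean_ci(wages)
t.test(wages, conf.level = 0.95)$conf.int   # should agree with lower/upper
```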
Base R Tools
?t.test
output = t.test(NursesWages$`2016HourlyWage`, conf.level = 0.95)
output$conf.int
[1] 27.10441 27.52825
attr(,"conf.level")
[1] 0.95
This matches the interval computed by hand above.